Abstract

This paper investigates computer architecture in conjunction with the 
algorithmic structures of nonlinear finite-element analysis. To help set the stage
for this goal, the development is undertaken by considering the wide-ranging needs 
associated with the analysis of rolling tires which possess the full range of 
kinematic, material and boundary condition induced nonlinearity in addition to gross 
and local cord-matrix material properties. 


1. Introduction 


With the advent of the finite-element method (FEM), the analysis of large-scale
structures is finally possible. While large-scale linear finite-element simulations
are relatively economical, such is not the case for nonlinear situations involving
geometric, material and boundary induced nonlinearity. There are numerous
aerospace and commercial structures which require full-scale nonlinear analysis to 
enable their improved design. This includes such structural systems as gas turbines, 
space structures, aircraft structure, autos, etc. Perhaps the most commonplace of 
such structures is the tire, which serves as a component to a wide variety of 
aerospace and auto systems. 

To bypass the difficulties associated with nonlinear FE analysis, significant 
work has been channeled into two main areas, namely: 

i) The development of algorithmic improvements: element-by-element, constrained
Newton/Raphson (NR), and hierarchical least-squares schemes [7]

ii) The design of new computer architecture enabling hardware speedup, e.g.,
vector processors (Cray, Cyber 205) and true parallel machines

In the context of such thrusts, not enough effort has been undertaken to 
consider how algorithmic structures might affect machine architecture or vice versa.

Based on the foregoing comments, this paper will investigate machine architecture
in conjunction with algorithmic structure. To achieve this goal, the development
will be undertaken by considering the wide-ranging needs associated with the
analysis of tires. This approach was taken since, as will be seen in later sections,
the needs of tire modeling embody essentially all the requirements of nonlinear
continuum mechanics, namely

i) Material nonlinearity

ii) Inelastic behavior

iii) Large deformation/strain kinematics

iv) Complex inertial fields

v) Nonlinear boundary conditions

vi) Microstructure

vii) Thermomechanical response

viii) Solid-fluid interaction

All this leads to the development of what is called hierarchical substructural
parallelism, which enables bottom-up/top-down modeling. Overall, a nonlinear
multilevel substructuring scheme is outlined which simplifies the
database management (DBM) of parallel-type operators while still yielding
enhanced computational speeds and reduced core requirements.

The sections that follow discuss tire modeling as an embodiment of the diverse
needs of nonlinear simulation, survey the current types of machine architecture,
and examine the potential of hierarchical substructural parallelism. Examples
that illustrate the enhanced properties are also given.


2. Shortcomings of FEM Vis-a-vis Tire Structural Analysis

Noting Figure 1, the tire possesses a very regionalized/substructural form of 
construction. Overall it consists of: 

i) Carcass plies, steel/glass/Kevlar cord-rubber composites 

ii) Belt plies (same as above) 

iii) Bead, bundled steel cords 

iv) Tread configuration

v) Regionalized rubber types 

vi) Belt edges, turnup plies 

The operating environment consists of:

i) The tire-road interface, which involves varying pavement textures,
flexibilities and resulting frictional characteristics

ii) The tire-rim interface

iii) The tire-rim-suspension behavior

iv) Cornering, braking and accelerating maneuvers

v) Standing, steady/transient rolling [13-15]

vi) Obstacle/hole envelopment rollover events

vii) Pressurization

As seen from Figures 2 and 3, the pressurization and subsequent loading into 
standing contact can lead to large deformations and associated rotations. For 
instance Table 1 illustrates comparisons of the deflection fields generated from 
linear and nonlinear FE simulations. 

In this context, it follows that there are several sources of response 
nonlinearity, namely 

i) Large deformation kinematics 

ii) The road-tire-rim interfaces 

iii) Bimodular behavior of cord-rubber composites in transitions from 
tension to compression 

iv) Thermomechanical interactions 

v) Material nonlinearity 

vi) Local large strain levels in various regions of the tire; 
belt edges, bead region, and tread 

vii) Dynamic impact interactions 

Each of the foregoing sources of nonlinearity initiates different forms of response 
behavior. 

For instance, from a kinematics point of view, the pressurization process causes 
rotations and deflections which lead to an overall stiffening of the tire. 

Similarly, as with Hertzian contact problems, the tire-road interface also exhibits
hardening-type properties, namely, the hub force-deflection response is stiffening in 
character as noted in Figure 4. 

In addition to the foregoing modeling difficulties, in general the tire response
needs to be handled at several levels, namely

i) Cord-matrix and regionalized rubber interfaces 

ii) Whole cord-rubber plies/laminae

iii) Full laminate structures, several plies as in belt and 
carcass laminates 

iv) Full (global) structure 

As one proceeds from (i) to (iv), a "bottom-up" modeling approach is required wherein
fine detail is handled at the lowest level while the upper level models are
increasingly coarse so as to reduce overall degrees of freedom in a global model.

Once the global-level model is solved, what is needed is a "top-down" scheme to
provide proper mechanics information at the constituent level. Such an approach is
necessary if proper stress and strain fields are to be captured, hence enabling proper
description of internal fields.

Current FE models of tires start from level (iii) and proceed to (iv). In this 
way, a true local-level description of mechanical fields is not possible. 


3. Types of Parallelism

Multiprocessor computers fall basically into two main categories, namely 

i) Vector processors (Cray, Cyber 205) 
ii) True parallel processors (Flex, Goodyear) 

Compared with single processor units (IBM 3084, CDC7600), vector processors 
enable quicker, more efficient handling of matrix manipulations. This is achieved
through the use of multiple processors which operate simultaneously on a succession 
of matrix elements. Data transfer for such operations is typically from a single 
common core storage. 

In true parallel processors, different functions/operations are performed in 
separate processors. In such machines data transfer usually involves both a common 
core as well as individual local processor cores. For such machines very high speeds 
can be realized. 

In the context of programming languages, vector processors typically can be
programmed in enhanced versions of FORTRAN or the like. For true parallel processors,
overall programming is generally achieved at two levels. At the local processor
level, languages such as FORTRAN can be employed. At the total system level,
a machine control language (MCL) is usually employed.


4. Classical Solution Algorithm


The solution of large-scale FE simulations typically involves either some 
variant of the Newton/Raphson (NR) scheme, or an explicit/implicit time integration
procedure. For the current demonstration purposes, the presentation will concentrate 
on static equation solvers. The most recent improvements for such problems fall into 
several categories, namely 

i) Element-by-element preconditioners (Hughes et al.)

ii) Constrained NR procedures of Padovan and Arechaga

iii) Constrained hierarchical least-squares algorithms of
Padovan and Lackney


Assuming large deformation kinematics along with potential material
nonlinearity, the governing FE formulation takes the form

$$ F = G + \int_{R} [B^*]^T S \, dv \tag{1} $$


where $S$ is the second Piola-Kirchhoff stress tensor, $F$ is the nodal force vector
and $G$ is the vector of body forces. Typically (1) is nonlinear and must be solved
via NR schemes. After expansion into a truncated Taylor series, (1) yields the
following NR algorithm, namely

$$ \Delta G + [K_i]\,\Delta Y_{i+1} = F + \Delta F - \int_{R} [B^*_i]^T S_i \, dv \tag{2} $$

where $[K_i]$ defines the tangent stiffness matrix, that is [6,7]

$$ [K_i] = \int_{R} \left( [G]^T [S_i] [G] + [B^*_i]^T [D_{T,i}] [B^*_i] \right) dv \tag{3} $$

such that $[S_i]$ is the prestress matrix and $[D_{T,i}]$ is the tangent material stiffness.

As noted earlier, the solution involves either the use of constrained procedures for
appropriate load increment control or a direct Gaussian-type inversion scheme.
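
As a minimal sketch of the iteration implied by (2), the following Python fragment applies NR with simple load incrementation to a hypothetical two-degree-of-freedom hardening system. The functions `internal_force` and `tangent_stiffness` are illustrative stand-ins for the integrals appearing in (2) and (3), the body force increment is taken as zero, and all numerical values are assumptions rather than data from this paper.

```python
import numpy as np

def internal_force(y, k_lin, beta):
    """Stand-in for the internal force integral: linear part plus cubic hardening."""
    return k_lin @ y + beta * y**3

def tangent_stiffness(y, k_lin, beta):
    """Stand-in for [K_i]: derivative of internal_force with respect to y."""
    return k_lin + np.diag(3.0 * beta * y**2)

def newton_raphson(f_ext, k_lin, beta, n_increments=4, tol=1e-10, max_iter=25):
    """Apply the load in equal increments and iterate to equilibrium at each one."""
    y = np.zeros_like(f_ext)
    for step in range(1, n_increments + 1):
        f_target = f_ext * step / n_increments
        for _ in range(max_iter):
            # Right-hand side of (2) with the body force increment set to zero
            residual = f_target - internal_force(y, k_lin, beta)
            if np.linalg.norm(residual) <= tol * max(1.0, np.linalg.norm(f_target)):
                break
            # Solve [K_i] dY = residual and update the deflections
            y = y + np.linalg.solve(tangent_stiffness(y, k_lin, beta), residual)
        else:
            raise RuntimeError("Newton/Raphson did not converge")
    return y

# Hypothetical data for illustration only
K_LIN = np.array([[2.0, -1.0], [-1.0, 2.0]])
F_EXT = np.array([1.0, 3.0])
BETA = 0.5
y_solution = newton_raphson(F_EXT, K_LIN, BETA)
```

The sketch uses plain load stepping; the constrained procedures mentioned above would replace the fixed increment with a controlled one.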

To date such methodologies have been employed either in single processor or 
vector processor machines. The shortcomings of the FEM outlined in the previous 
section are essentially a direct outgrowth of the limitations of the architecture of 
single and vector-processor-type machines. In the next section, the intrinsic
structure of the incremental NR (INR) algorithm will be explored to define new
computer architectures to bypass such difficulties.


5. Hierarchical Substructuring


From a conceptual point of view, the INR scheme defined by (2) does not confine 
the FEM scheme to a particular type of computer configuration. Rather the problems 
of speed and storage are essentially hardware based. Specifically the main questions 
and problems evolve out of the need to define architectures which enable the use of 
multiple processors so as to enhance overall machine speed as well as memory size. 
While the CRAY and CYBER systems are certainly a step in the right direction, they 
fall short of the ultimate requirements. Currently very large-scale FE models can 
easily outstrip the available core storage and machine CPU speeds. 

In seeking to develop new computer architectures one is faced with the fact that 

i) Vector processors require extensive cores as well as complex logic flows 

ii) True parallel processors still await the fruition of properly organized DBM 

Based on the foregoing, this paper seeks to develop what is called a hierarchical
form of substructural parallelism. Following the pioneering efforts of the NASA
Langley group, a nonlinear FE simulation, say of the tire, can be
logically divided into a hierarchy of substructural groups defined by a variety of
attributes, namely

i) Material group 

ii) Geometric configuration 

iii) Kinematic behavior 

iv) Boundary conditions 

At the lowest rung of the hierarchy, items (i)-(iv) are employed to define the 
specific local level substructural groups. The choice of the number of first-order 
groups is contingent on: 

i) Minimizing core requirements of local level processors 

ii) Minimizing number of perimeter nodes so that higher order substructural 
groups also have reduced core requirements for associated processors. 

As can be seen, the main thrust is to maintain in-core solutions for each local
substructural processor.

Noting Figure 5, a given FE simulation can be broken up into a number of
substructural levels. At each level internal nodes are eliminated to enable assembly
through perimeter nodes. In terms of (2), the NR algorithm and its constrained
counterpart can be substructured to yield the following first-level algorithms, that
is:

$$ \Delta F^{(1,k)}_{i+1} = [K^{(1,k)}_{i}]\,\Delta Y^{(1,k)}_{i+1} + \Delta G^{(1,k)}_{i+1} \tag{4} $$

k = 1, 2, ..., number of first-level substructures, such that

$$ \Delta F^{(1,k)}_{i+1} = F^{(1,k)}_{i+1} - \int_{R^{(1,k)}} [B^*_i]^T S_i \, dv \tag{5} $$

$$ \Delta G^{(1,k)}_{i+1} = \int_{R^{(1,k)}} [N]^T \Delta F_{i+1} \, dv \tag{6} $$

where $(\;)^{(1,k)}$ denotes the first-level kth substructure, $(\;)_{i+1}$ the (i+1)th
iteration, $\Delta F^{(1,k)}_{i+1}$ the nodal load increment, $[K^{(1,k)}_{i}]$ the substructural tangent
stiffness, $\Delta Y^{(1,k)}_{i+1}$ the nodal deflection increment and $\Delta G^{(1,k)}_{i+1}$ the body force
increment.

To enable assembly into second-order substructural groups, (4) is partitioned 
into internal and perimeter nodes yielding 


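$$ \Delta F^{(1,k)}_{i+1} = \begin{Bmatrix} \Delta F^{(1,k)}_{P,i+1} \\ \Delta F^{(1,k)}_{I,i+1} \end{Bmatrix} \tag{7} $$

$$ \Delta Y^{(1,k)}_{i+1} = \begin{Bmatrix} \Delta Y^{(1,k)}_{P,i+1} \\ \Delta Y^{(1,k)}_{I,i+1} \end{Bmatrix} \tag{8} $$

$$ \Delta G^{(1,k)}_{i+1} = \begin{Bmatrix} \Delta G^{(1,k)}_{P,i+1} \\ \Delta G^{(1,k)}_{I,i+1} \end{Bmatrix} \tag{9} $$

$$ [K^{(1,k)}_{i}] = \begin{bmatrix} [K^{(1,k)}_{PP,i}] & [K^{(1,k)}_{PI,i}]^T \\ [K^{(1,k)}_{PI,i}] & [K^{(1,k)}_{II,i}] \end{bmatrix} \tag{10} $$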


Employing (7)-(10) we obtain the following relationships for the inner and perimeter
nodes

$$ \Delta F^{(1,k)}_{P,i+1} = [K^{(1,k)}_{P,i}]\,\Delta Y^{(1,k)}_{P,i+1} + \Delta f^{(1,k)}_{P,i+1} \tag{11} $$

$$ \Delta Y^{(1,k)}_{I,i+1} = -[\kappa^{(1,k)}_{PI,i}]\,\Delta Y^{(1,k)}_{P,i+1} + \Delta f^{(1,k)}_{I,i+1} \tag{12} $$

where

$$ [K^{(1,k)}_{P,i}] = [K^{(1,k)}_{PP,i}] - [K^{(1,k)}_{PI,i}]^T [K^{(1,k)}_{II,i}]^{-1} [K^{(1,k)}_{PI,i}] \tag{13} $$

$$ \Delta f^{(1,k)}_{P,i+1} = [K^{(1,k)}_{PI,i}]^T [K^{(1,k)}_{II,i}]^{-1} \left( \Delta F^{(1,k)}_{I,i+1} - \Delta G^{(1,k)}_{I,i+1} \right) + \Delta G^{(1,k)}_{P,i+1} \tag{14} $$

$$ [\kappa^{(1,k)}_{PI,i}] = [K^{(1,k)}_{II,i}]^{-1} [K^{(1,k)}_{PI,i}] \tag{15} $$

$$ \Delta f^{(1,k)}_{I,i+1} = [K^{(1,k)}_{II,i}]^{-1} \left( \Delta F^{(1,k)}_{I,i+1} - \Delta G^{(1,k)}_{I,i+1} \right) \tag{16} $$
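
The condensation expressed by (11)-(16) can be sketched for a single substructure as follows. This is an illustrative Python fragment, not the author's implementation: the function name `condense`, the random test matrix and the choice of perimeter and inner degrees of freedom are all assumptions. The block `k_pi` is taken as the inner-row, perimeter-column partition, so that its transpose couples the perimeter rows to the inner unknowns.

```python
import numpy as np

def condense(k, d_f, d_g, perimeter, inner):
    """Eliminate inner degrees of freedom of one substructure, after (13)-(16)."""
    k_pp = k[np.ix_(perimeter, perimeter)]
    k_pi = k[np.ix_(inner, perimeter)]      # inner rows, perimeter columns
    k_ii = k[np.ix_(inner, inner)]
    kappa = np.linalg.solve(k_ii, k_pi)                      # (15)
    df_i = np.linalg.solve(k_ii, d_f[inner] - d_g[inner])    # (16)
    k_p = k_pp - k_pi.T @ kappa                              # (13)
    df_p = k_pi.T @ df_i + d_g[perimeter]                    # (14)
    return k_p, df_p, kappa, df_i

# Hypothetical symmetric positive definite stiffness and load data
rng = np.random.default_rng(0)
a = rng.standard_normal((6, 6))
k_demo = a @ a.T + 6.0 * np.eye(6)
d_f_demo = rng.standard_normal(6)
d_g_demo = rng.standard_normal(6)
perimeter_dof, inner_dof = [0, 1, 5], [2, 3, 4]

k_p, df_p, kappa, df_i = condense(k_demo, d_f_demo, d_g_demo, perimeter_dof, inner_dof)

# Forward step (11): solve for the perimeter increments only
dy = np.empty(6)
dy[perimeter_dof] = np.linalg.solve(k_p, d_f_demo[perimeter_dof] - df_p)
# Backward step (12): recover the inner increments from the perimeter data
dy[inner_dof] = -kappa @ dy[perimeter_dof] + df_i

# Reference: direct solution of the unpartitioned system (4)
dy_direct = np.linalg.solve(k_demo, d_f_demo - d_g_demo)
```

Only `k_p` and `df_p` need to leave the local processor in the forward step; `kappa` and `df_i` can stay in local core until the backward step.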


Assembling (11) yields the second-level substructural relationships, namely

$$ \Delta F^{(2,k)}_{i+1} = [K^{(2,k)}_{i}]\,\Delta Y^{(2,k)}_{i+1} + \Delta G^{(2,k)}_{i+1} \tag{17} $$

k = 1, 2, ..., number of second-level substructures

By partitioning (17) into inner and perimeter degrees of freedom we obtain the
third-order substructural relations after the appropriate assembly process. Continuing the
partitioning and assembly process yields the various higher order substructural
relations, specifically


$$ \Delta F^{(j,k)}_{i+1} = [K^{(j,k)}_{i}]\,\Delta Y^{(j,k)}_{i+1} + \Delta G^{(j,k)}_{i+1} \tag{18} $$

wherein the associated inner and perimeter partitions take the form

$$ \Delta F^{(j,k)}_{P,i+1} = [K^{(j,k)}_{P,i}]\,\Delta Y^{(j,k)}_{P,i+1} + \Delta f^{(j,k)}_{P,i+1} \tag{19} $$

$$ \Delta Y^{(j,k)}_{I,i+1} = -[\kappa^{(j,k)}_{PI,i}]\,\Delta Y^{(j,k)}_{P,i+1} + \Delta f^{(j,k)}_{I,i+1} \tag{20} $$
such that

$$ [K^{(j,k)}_{P,i}] = [K^{(j,k)}_{PP,i}] - [K^{(j,k)}_{PI,i}]^T [K^{(j,k)}_{II,i}]^{-1} [K^{(j,k)}_{PI,i}] \tag{21} $$

$$ \Delta f^{(j,k)}_{P,i+1} = [K^{(j,k)}_{PI,i}]^T [K^{(j,k)}_{II,i}]^{-1} \left( \Delta F^{(j,k)}_{I,i+1} - \Delta G^{(j,k)}_{I,i+1} \right) + \Delta G^{(j,k)}_{P,i+1} \tag{22} $$

$$ [\kappa^{(j,k)}_{PI,i}] = [K^{(j,k)}_{II,i}]^{-1} [K^{(j,k)}_{PI,i}] \tag{23} $$

$$ \Delta f^{(j,k)}_{I,i+1} = [K^{(j,k)}_{II,i}]^{-1} \left( \Delta F^{(j,k)}_{I,i+1} - \Delta G^{(j,k)}_{I,i+1} \right) \tag{24} $$


Based on (11)-(24), we see that the overall nonlinear hierarchical substructuring
requires a forward calculation phase as well as a backward stage. The forward phase
involves the use of (11), (13), (14), (19), (21) and (22). In contrast, the backward
phase, which involves the definition of inner nodes, incorporates the use of (12),
(15), (16), (20), (23) and (24). In terms of the forward iterative algorithms, the
overall required machine architecture takes the form defined in Figure 6. Note that the
common data buses linking successive substructural levels need only provide access to
perimeter data. In this way, significantly less data need to be accessed by the
global-level DBM. This applies throughout the forward phase of the iteration
process. Overall, the steps handled by each of the succeeding levels involve
assembly, inner/perimeter partitioning, and setting up effective stiffnesses for the
forward and backward phases. In terms of (21)-(24), the stiffnesses associated with
the perimeter and inner nodes involve an inverse of the inner partition of the kth
substructural stiffness. All such manipulations must be performed by processors
dedicated to each of the k individual substructures associated with the various
hierarchical levels.

Once the forward loop of calculations is complete, the perimeter data must be
backtracked to the inner nodes of each of the various substructures at the different
substructural levels. The overall flow of control/calculation is depicted in
Figure 7. As can be seen, the perimeter data are used to determine the inner nodal
incremental excursions. This is achieved through the use of the family of expressions
defined by (20). Once the back substitutions to the succeeding levels up to
and including the first are completed, the standard norm-type convergence checks must
be implemented to ascertain the quality of convergence. Contingent on the convergence
check, the iteration process can be cycled through the forward and backward
phases of the substructural hierarchy.
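
One pass through the forward and backward phases can be sketched for a two-level hierarchy as follows. The model is a hypothetical chain of grounded springs split into equal substructures, chosen only because it is small enough to verify against a direct solution; the names `forward`, `solve_hierarchical` and `solve_direct`, the sizes and the stiffness values are all assumptions. A thread pool stands in for the processors dedicated to the individual substructures.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

N_SUB, N_ELEM = 4, 5      # hypothetical: substructures, elements per substructure
K_E, K_G = 10.0, 1.0      # hypothetical element and grounding stiffnesses

def substructure_stiffness():
    """Stiffness of one chain substructure with N_ELEM + 1 nodes."""
    k = np.zeros((N_ELEM + 1, N_ELEM + 1))
    elem = np.array([[K_E + K_G, -K_E], [-K_E, K_E + K_G]])
    for e in range(N_ELEM):
        k[e:e + 2, e:e + 2] += elem
    return k

def forward(job):
    """Forward phase for one substructure: condense inner nodes onto the perimeter."""
    k, f_inner = job
    p, i = [0, N_ELEM], list(range(1, N_ELEM))
    k_pi = k[np.ix_(i, p)]                     # inner rows, perimeter columns
    k_ii = k[np.ix_(i, i)]
    kappa = np.linalg.solve(k_ii, k_pi)
    df_i = np.linalg.solve(k_ii, f_inner)
    return k[np.ix_(p, p)] - k_pi.T @ kappa, k_pi.T @ df_i, kappa, df_i

def solve_hierarchical(f):
    k_sub = substructure_stiffness()
    jobs = [(k_sub, f[s * N_ELEM + 1:(s + 1) * N_ELEM]) for s in range(N_SUB)]
    with ThreadPoolExecutor() as pool:         # substructures are independent
        condensed = list(pool.map(forward, jobs))
    # Second level: assemble perimeter stiffnesses and effective loads only
    k2 = np.zeros((N_SUB + 1, N_SUB + 1))
    f2 = f[::N_ELEM].copy()
    for s, (k_p, df_p, _, _) in enumerate(condensed):
        k2[s:s + 2, s:s + 2] += k_p
        f2[s:s + 2] -= df_p
    y_p = np.linalg.solve(k2, f2)
    # Backward phase: recover inner deflections from perimeter data
    y = np.zeros_like(f)
    y[::N_ELEM] = y_p
    for s, (_, _, kappa, df_i) in enumerate(condensed):
        y[s * N_ELEM + 1:(s + 1) * N_ELEM] = -kappa @ y_p[s:s + 2] + df_i
    return y

def solve_direct(f):
    """Reference: assemble and solve the full system on a single processor."""
    n = N_SUB * N_ELEM + 1
    k = np.zeros((n, n))
    for s in range(N_SUB):
        a = s * N_ELEM
        k[a:a + N_ELEM + 1, a:a + N_ELEM + 1] += substructure_stiffness()
    return np.linalg.solve(k, f)

f_demo = np.linspace(1.0, 2.0, N_SUB * N_ELEM + 1)
y_hier = solve_hierarchical(f_demo)
y_ref = solve_direct(f_demo)
```

In a nonlinear analysis this pass would be repeated with updated tangent stiffnesses until the norm-type convergence check is satisfied; the linear chain used here converges in a single pass.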


6. Discussion


To illustrate the hierarchical substructural scheme, consider the three-level 
simulation defined in Figure 8. The number of nodes and substructures associated with
the example are given in Figure 9. Based on the number of inner and perimeter 
variables depicted, the expressions defining the number of respective nodes are given 
by: 


i) Level 1

$$ \text{Perimeter nodes} = 2(\ell_1 + \ell_2 - 2) \tag{25} $$

$$ \text{Inner nodes} = (\ell_1 - 2)(\ell_2 - 2) \tag{26} $$

ii) Level 2

$$ \text{Perimeter nodes} = 2(\ell_1 n_1 + \ell_2 n_2 - n_1 - n_2) \tag{27} $$

$$ \text{Inner nodes} = n_1 n_2 (\ell_1 + \ell_2) - n_1(\ell_1 + 2) - n_2(\ell_2 + 2) - 3 n_1 n_2 + 1 \tag{28} $$

iii) Level 3

$$ \text{Perimeter nodes} = 2 m_1 n_1 (\ell_1 - 1) + 2 m_2 n_2 (\ell_2 - 1) \tag{29} $$

$$ \text{Inner nodes} = \ell_1 m_1 n_1 (m_2 - 1) + \ell_2 n_2 m_2 (m_1 - 1) - m_1 m_2 (n_1 + n_2) + m_1 n_1 + m_2 n_2 - m_1 m_2 + 1 \tag{30} $$


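Employing (25)-(30) we see that the storage effectiveness of each of the various levels
is expressed by the relations

$$ S^{(k)} = \frac{\text{Perimeter}}{\text{Perimeter} + \text{Inner}} \tag{31} $$

where k denotes the level number. In the context of (31), it follows that

$$ S^{(1)} = \frac{2(\ell_1 + \ell_2 - 2)}{\ell_1 \ell_2} \tag{32} $$

$$ S^{(2)} = \frac{2(\ell_1 n_1 + \ell_2 n_2 - n_1 - n_2)}{n_1 \ell_1 (n_2 + 1) + \ell_2 n_2 (n_1 + 1) - 3 n_1 n_2 - 4(n_1 + n_2) + 1} \tag{33} $$

$$ S^{(3)} = \frac{2 m_1 n_1 (\ell_1 - 1) + 2 m_2 n_2 (\ell_2 - 1)}{\ell_1 m_1 n_1 (m_2 + 1) + \ell_2 m_2 n_2 (m_1 + 1) - m_1 m_2 (n_1 + n_2 + 1) - (m_1 n_1 + m_2 n_2) + 1} \tag{34} $$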

Consider the case wherein

$$ \ell_1 = 100, \quad \ell_2 = 50 $$
$$ n_1 = 5, \quad n_2 = 4 $$
$$ m_1 = 3, \quad m_2 = 4 $$

In terms of the foregoing, Table 2 gives the total number of

• degrees of freedom 

• processors required at each level 

• perimeter/inner nodes 

as well as the storage effectiveness of each of the substructural levels. Noting 
that a straight solution of the given problem would require a stiffness matrix of
order 1.2 x 10^6, it follows from Table 2 that very significant storage savings as
well as speed enhancements can be achieved.
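
The node counts (25)-(30) and the storage effectiveness (31) can be evaluated for the quoted case with a short Python sketch. The formulas are coded as they read in this text; the assignment m1 = 3, m2 = 4 and the function names are assumptions. The total node count at the end further assumes a rectangular mesh in which adjacent substructures share their edge nodes, which is not stated explicitly in the text.

```python
def level_counts(l1, l2, n1, n2, m1, m2):
    """Return {level: (perimeter_nodes, inner_nodes)} following (25)-(30)."""
    p1 = 2 * (l1 + l2 - 2)
    i1 = (l1 - 2) * (l2 - 2)
    p2 = 2 * (l1 * n1 + l2 * n2 - n1 - n2)
    i2 = n1 * n2 * (l1 + l2) - n1 * (l1 + 2) - n2 * (l2 + 2) - 3 * n1 * n2 + 1
    p3 = 2 * m1 * n1 * (l1 - 1) + 2 * m2 * n2 * (l2 - 1)
    i3 = (l1 * m1 * n1 * (m2 - 1) + l2 * n2 * m2 * (m1 - 1)
          - m1 * m2 * (n1 + n2) + m1 * n1 + m2 * n2 - m1 * m2 + 1)
    return {1: (p1, i1), 2: (p2, i2), 3: (p3, i3)}

def storage_effectiveness(perimeter, inner):
    """Ratio (31): perimeter nodes over all nodes handled at a level."""
    return perimeter / (perimeter + inner)

# Case quoted in the text (m1, m2 labels assumed)
L1, L2, N1, N2, M1, M2 = 100, 50, 5, 4, 3, 4
counts = level_counts(L1, L2, N1, N2, M1, M2)
ratios = {level: storage_effectiveness(p, i) for level, (p, i) in counts.items()}

# Assumed rectangular mesh with shared edge nodes between substructures
total_nodes = (M1 * N1 * (L1 - 1) + 1) * (M2 * N2 * (L2 - 1) + 1)

for level in sorted(counts):
    p, i = counts[level]
    print(f"level {level}: perimeter={p}, inner={i}, ratio={ratios[level]:.4f}")
print(f"total nodes in unsubstructured mesh: {total_nodes}")
```

For level 1 this gives 296 perimeter and 4704 inner nodes per substructure, a ratio of 296/5000, or about 0.059. Under the stated mesh assumption the unsubstructured problem has 1,166,510 nodes, which is consistent with the order of 1.2 x 10^6 quoted above. These figures are computed from the formulas and are not taken from Table 2.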

In the context of the foregoing development, it follows that hierarchical
substructural parallelism has decided advantages over vector-type processors, namely:

i) Global common core is reduced in size

ii) Substructures are handled in smaller local cores which could employ
vector processors and which are controlled by local DBM

iii) Data transfer between succeeding levels of the substructural hierarchy
is reduced, thereby reducing the load on the DBM

iv) Various substructures are updated, inverted, and assembled
simultaneously, hence enhancing the overall speed

v) The overall addressing requirements are reduced since the size of
individual substructural zones is much smaller

vi) Extensive use of cache memory (RAM disk) can be made at the local
level, thereby reducing disk I/O

vii) Backward and forward steps follow natural formulational lines

viii) Element-by-element or hierarchical least-squares algorithms can
be employed at the local substructural level

ix) Linear/nonlinear problem partitioning can be more logically handled

x) Overall control of the machine is more logical and less difficult
since local processors are essentially autonomous within the updating
and inverting phases of the operation

xi) The MCL can be patterned about a well-defined substructuring methodology; the
transfer of control from level to level is contingent on the
monitoring status of stiffness/inversion calculations

xii) The database manager needs only to deal with data residing on perimeters
of the substructure; as noted earlier, this significantly reduces the amount
of data transferred between levels.

As discussed earlier, the modeling of tires in their use environment represents 
perhaps one of the most comprehensive single component nonlinear structural response 
problems currently available. This follows from the fact that geometric-, material-, 
and boundary-induced nonlinearity all simultaneously act to define the global 
response behavior. Due to their regionalized/substructural form of construction,
tires represent a good modeling problem to help define the architecture of high-level
multiprocessor machines. In this context, a hierarchical form of substructural
parallelism has decided advantages over other forms of multiprocessors. As has been
seen, such a procedure has several theoretical advantages for nonlinear problems.
These stem from the simplified DBM structure, reduced data flow, smaller global
core, and reduced addressing requirements.

Overall future work in this area should 

• Place greater emphasis on algorithmic architecture and its possible effects on
machine structure 

• Establish proper control configuration for hierarchical DBM 

• Extend scheme to constrained incremental Newton/Raphson (INR) least-squares 
algorithms as well as transient schemes 

• Apply concept to available parallel processors 

• Structure procedure so as to enable either direct or iterative solutions at 
substructural level 

• Establish criteria to enable determination of quality of convergence at 
local substructural level 


TABLE 1 COMPARISON OF LINEAR AND NONLINEAR FE SIMULATION OF PRESSURIZED TIRE

TABLE 2 COMPARISON OF HIERARCHICAL SUBSTRUCTURAL PARALLEL AND SINGLE PROCESSOR SYSTEMS

* D of F/P: degrees of freedom per processor

Total number of perimeter and inner D of F


Figure 5 Substructural zones of tire.

Figure 6 Flow of control: forward loop